Abstract



In this work, we present the first local-decoding algorithm for expander codes. This yields a new family of constant-rate codes that can recover from a constant fraction of errors in the codeword symbols, and where any symbol of the codeword can be recovered with high probability by reading $N^{\varepsilon}$ symbols from the corrupted codeword, where $N$ is the block-length of the code.

Expander codes, introduced by Sipser and Spielman, are formed from an expander graph $G = (V, E)$ of degree $d$, and an inner code of block-length $d$ over an alphabet $\Sigma$. Each edge of the expander graph is associated with a symbol in $\Sigma$. A string in $\Sigma^{E}$ will be a codeword if for each vertex in $V$, the symbols on the adjacent edges form a codeword in the inner code.

We show that if the inner code has a smooth reconstruction algorithm in the noiseless setting, then the 
corresponding expander code has an efficient local-correction algorithm in the noisy setting. Instantiating 
our construction with inner codes based on finite geometries, we obtain novel locally decodable codes 
with rate approaching one. This provides an alternative to the multiplicity codes of Kopparty, Saraf and 
Yekhanin (STOC 11) and the lifted codes of Guo, Kopparty and Sudan (ITCS 13). 
1 Introduction

Expander codes, introduced in [32], are linear codes which are notable for their efficient decoding algorithms.
In this paper, we show that when appropriately instantiated, expander codes are also locally decodable, and 
we give a sublinear time local-decoding algorithm. 

In standard error correction, a sender encodes a message $x \in \{0,1\}^k$ as a codeword $c \in \{0,1\}^N$, and transmits it to a receiver across a noisy channel. The receiver's goal is to recover $x$ from the corrupted codeword $w$. Decoding algorithms typically process all of $w$ and in turn recover all of $x$. The goal of local decoding is to recover only a single bit of $x$, with the benefit of querying only a few bits of $w$. The number of bits, $q$, of $w$ needed to recover a single bit of $x$ is known as the query complexity, and the important trade-off in local decoding is between this quantity and the rate $r = k/N$ of the code. When $q$ is constant or even logarithmic in $k$, the best known codes have rates which tend to zero as $N$ grows. Until recently, there were no known locally decodable codes with rate close to one and sublinear locality; to date, there are only two constructions known [25, 20]. In this work, we show that expander codes provide a third construction of efficiently locally decodable codes with rate approaching one.

1.1 Notation and preliminaries 

Before we state our main results, we set notation and give a few definitions. We will construct linear codes $C$ of length $N$ and message length $k$, over an alphabet $\Sigma = \mathbb{F}$, for some finite field $\mathbb{F}$. That is, $C \subseteq \mathbb{F}^N$ is a linear subspace of dimension $k$. The rate of $C$ is the ratio $r = k/N$. We will also use expander graphs: we say a $d$-regular graph $G$ is a spectral expander with parameter $\lambda$ if $\lambda$ is the second-largest eigenvalue of the normalized adjacency matrix of $G$. Intuitively, the smaller $\lambda$ is, the better connected $G$ is; see [21] for a survey of expanders and their applications. For $n \in \mathbb{Z}$, $[n]$ denotes the set $\{1, 2, \ldots, n\}$. For $x, y \in \Sigma^N$, $\Delta(x, y)$ denotes relative Hamming distance, $x[i]$ denotes the $i$-th symbol of $x$, and $x|_S$ denotes $x$ restricted to symbols indexed by $S \subseteq [N]$.
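The two pieces of notation used most often below, the relative Hamming distance $\Delta(x, y)$ and the restriction $x|_S$, can be sketched as small helpers. The function names are illustrative, and positions are 0-based here (the text indexes from 1), which is an assumption of this sketch.

```python
def rel_hamming(x, y):
    """Relative Hamming distance Delta(x, y): fraction of positions that differ."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def restrict(x, positions):
    """Restriction x|_S: the symbols of x at the (0-based) positions in S, in order."""
    return [x[i] for i in sorted(positions)]
```

For example, two words of length 4 that differ in two places are at relative distance 1/2.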

A code (along with an encoding algorithm) is locally decodable if there is an algorithm which can recover a symbol $x[i]$ of the message, making only a few queries to the received word.

Definition 1 (Locally Decodable Codes (LDCs)). Let $C \subseteq \Sigma^N$ be a code of size $|\Sigma|^k$, and let $E : \Sigma^k \to \Sigma^N$ be an encoding map. Then $(C, E)$ is $(q, \rho)$-locally decodable with error probability $\eta$ if there is a randomized algorithm $R$, so that for any $w \in \Sigma^N$ with $\Delta(w, E(x)) < \rho$, for each $i \in [k]$,

$$\mathbb{P}\{R(w, i) = x[i]\} \ge 1 - \eta,$$

and further $R$ accesses at most $q$ symbols of $w$.

In this work, we will actually construct locally correctable codes, which we will see below imply locally decodable codes.

Definition 2 (Locally Correctable Codes (LCCs)). Let $C \subseteq \Sigma^N$ be a code. Then $C$ is $(q, \rho)$-locally correctable with error probability $\eta$ if there is a randomized algorithm, $R$, so that for any codeword $c \in C$ and any $w \in \Sigma^N$ with $\Delta(w, c) < \rho$, for each $j \in [N]$,

$$\mathbb{P}\{R(w, j) = c[j]\} \ge 1 - \eta,$$

and further $R$ accesses at most $q$ symbols of $w$.

When there is a constant $\rho > 0$ and a failure probability $\eta = o(1)$ so that $C$ is $(q, \rho)$-locally correctable with error probability $\eta$, we will simply say that $C$ is locally correctable with query complexity $q$ (and similarly for locally decodable).

When $C$ is a linear code, writing the generator matrix in systematic form gives an encoding function $E : \mathbb{F}^k \to \mathbb{F}^N$ so that for every $x \in \mathbb{F}^k$ and for all $i \in [k]$, $E(x)[i] = x[i]$. In particular, if $C$ is a $(q, \rho)$ linear LCC, then $(C, E)$ is a $(q, \rho)$ LDC. Because of this connection, we will focus our attention on creating locally correctable linear codes.
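The reduction from local correction to local decoding can be illustrated with a small sketch. It assumes a systematic generator matrix $[I_k \mid P]$ over $\mathbb{F}_p$; the names `encode_systematic`, `decode_via_corrector` and the [7,4] Hamming parity part used in the usage note are illustrative choices, not part of the construction in this paper.

```python
def encode_systematic(x, parity_part, p=2):
    """Encode x with the generator matrix [I | P] over F_p, so E(x)[i] = x[i] for i < k."""
    k = len(x)
    checks = [
        sum(x[i] * parity_part[i][j] for i in range(k)) % p
        for j in range(len(parity_part[0]))
    ]
    return list(x) + checks


def decode_via_corrector(correct, w, i):
    """Local decoding from local correction: message symbol i is codeword symbol i.

    `correct(w, j)` stands for any local corrector R(w, j) of the code (hypothetical here).
    """
    return correct(w, i)
```

With the parity part `[[1,1,0],[1,0,1],[0,1,1],[1,1,1]]` of a [7,4] Hamming code, the message `[1,0,1,1]` is encoded as `[1,0,1,1,0,1,0]`, and its first four symbols are the message itself.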



Many LCCs work on the following principle: suppose, for each $i \in [N]$, there is a set of $q$ query positions $Q(i)$, which are smooth (that is, each query is almost uniformly distributed within the codeword) and a method to determine $c[i]$ from $\{c[j] : j \in Q(i)\}$ for any uncorrupted codeword $c \in C$. If $q$ is constant, this smooth local reconstruction algorithm yields a local correction algorithm: with high probability none of the locations queried are corrupted. When $q$ is merely sublinear in $N$, as is the case in this work, this reasoning fails. This work demonstrates how to turn codes which only possess a local reconstruction procedure (in the noiseless setting) into LCCs with constant rate and sublinear query complexity.

Definition 3 (Smooth reconstruction). For a code $C \subseteq \Sigma^N$, consider a pair of algorithms $(Q, A)$, where $Q$ is a randomized query algorithm with inputs in $[N]$ and outputs in $2^{[N]}$, and $A : \Sigma^q \times [N] \to \Sigma$ is a deterministic reconstruction algorithm. We say that $(Q, A)$ is an $s$-smooth local reconstruction algorithm with query complexity $q$ if the following hold.

1. For each $i \in [N]$, the query set $Q(i)$ has $|Q(i)| \le q$.

2. For each $i \in [N]$, there is some set $S \subseteq [N]$ of size $s$, so that each query in $Q(i)$ is uniformly distributed in $S$.

3. For all $i \in [N]$ and for all codewords $c \in C$, $A(c|_{Q(i)}, i) = c[i]$.

If $s = N$, then we say the reconstruction is perfectly smooth, since all symbols are equally likely to be queried. Notice that the queries need not be independent. In this work, the codes we will consider decode a symbol indexed by $x \in \mathbb{F}^m$ by querying random subspaces through $x$ (but not $x$ itself), and thus will have $s = N - 1$.

1.2 Related work 

The first local-decoding procedure for an error-correcting code was the majority-logic decoder for Reed- 
Muller codes proposed by Reed [31] . Local-decoding procedures have found many applications in theoretical 
computer science including proof-checking [26, 4, 30] and self-testing [10, 11, 13, 18]. While these applications
implicitly used local-decoding procedures, the first explicit definition of locally decodable codes did not appear 
until later [23]. An excellent survey is available [37]. The study of locally decodable codes focuses on the
trade-off between rate (the ratio of message length to codeword length) and query complexity (the number 
of queries made by the decoder). Research in this area is separated into two distinct areas: the first seeks to 
minimize the query complexity, while the second seeks to maximize the rate. In the low-query-complexity regime, Yekhanin was the first to exhibit codes with a constant number of queries and a subexponential rate [55]. Following Yekhanin's work, there has been significant progress in constructing locally decodable codes with constant query-complexity [36, 14, 13, 9, 23, 12, 8, 15]. On the other hand, in the high-rate regime,
there has been less progress. In 2011, Kopparty, Saraf and Yekhanin introduced multiplicity codes, the first 
constant-rate codes with a sublinear local-decoding algorithm [25]. Like Reed-Muller codes, multiplicity 
codes treat the message as a multivariate polynomial, and create codewords by evaluating the polynomial 
at a sequence of points. Multiplicity codes are able to improve on the performance of Reed-Muller codes by 
also including evaluations of the partial derivatives of the message polynomial in the codeword. A separate 
line of work has developed high-rate locally decodable codes by "lifting" shorter codes [20]. The work of Guo, Kopparty and Sudan takes a short code $C_0$ of length $q^t$, and lifts it to a longer code $C$, of length $N > q^t$ over $\mathbb{F}_q$, such that every restriction of a codeword in $C$ to an affine subspace of dimension $t$ yields a codeword in $C_0$. The definition provides a natural local-correcting procedure for the outer code: to decode a symbol of the outer code, pick a random affine subspace of dimension $t$ that contains the symbol, read the coordinates and decode the resulting codeword using the code $C_0$. Guo, Kopparty and Sudan show how to lift explicit inner codes so that the outer code has constant rate and query complexity $N^{\varepsilon}$.

In this work, we show that expander codes can also give locally decodable codes with rate approaching one, and with query complexity $N^{\varepsilon}$. Expander codes, introduced by Sipser and Spielman [32], are formed by choosing a $d$-regular expander graph $G$ on $n$ vertices, and a code $C_0$ of length $d$ (called the inner code), and defining the codewords to be all assignments of symbols to the edges of $G$ so that for every vertex in $G$, its edges form a codeword in $C_0$. The connection between error-correcting codes and graphs was first noticed by Gallager [16], who showed that a random bipartite graph induces a good error-correcting code. Gallager's construction was refined by Tanner [34], who suggested the use of an inner code. Sipser and Spielman [32] were the first to consider this type of code with an expander graph, and Spielman [33] showed that these expander codes could be encoded and decoded in linear time. Spielman's work provided the first family of error-correcting codes with linear-time encoding and decoding procedures. The decoding procedure has since been improved by Barg and Zemor [38, 5, 6, 7].

1.3 Our approach and contributions 

We show that certain expander codes can be efficiently locally decoded, and we instantiate our results to obtain novel families of $(N^{\varepsilon}, \rho)$-LCCs of rate $1 - \alpha$, for any sufficiently small constants $\alpha, \varepsilon$ and some positive constant $\rho$. Our decoding algorithm runs in time linear in the number of queries, and hence sublinear in the length of the message. We provide a general method for turning codes with smooth local reconstruction algorithms into LCCs: our main result, Theorem 5, states that as long as the inner code $C_0$ has rate at least 1/2 and possesses a smooth local reconstruction algorithm, then the corresponding family of expander codes are constant rate LCCs. In Section 3, we give some examples of appropriate inner codes, leading to the parameters claimed above.

In addition to providing a sublinear time local decoding algorithm for an important family of codes, our constructions are only the third known examples of constant rate LDCs, after multiplicity codes [25] and lifted Reed-Solomon codes [20]. Our approach (and the resulting codes) are very different from earlier approaches. Both multiplicity codes and lifted Reed-Solomon codes use the same basic principle, also at work in Reed-Muller codes: in these schemes, for any two codewords $c_1$ and $c_2$ which differ at index $i$, the corresponding queries $c_1|_{Q(i)}$ and $c_2|_{Q(i)}$ differ in many places. Thus, if the queries are smooth, with high probability they will not have too many errors, and the correct symbol can be recovered. In contrast, our decoder works differently: while our queries are smooth, they will not have this distance property. As we will see, changing a mere $\log(q)$ out of our $q$ queries may change the correct answer. The trick is that these problematic error patterns must have a lot of structure, and we will show that they are unlikely to occur.

Finally, our results port a typical argument from the low-query regime to the high-rate regime. As 
mentioned above, when the query complexity q is constant, a smooth local reconstruction algorithm is 
sufficient for local correctability. However, this reasoning fails when q grows with N. In this paper, we show 
how to make this argument go through: via Theorem 5, any family of codes $C_0$ with good rate and a smooth local decoder can be used to obtain a family of LCCs with similar parameters.

2 Local correctability of expander codes 

In this section, we give an efficient local correction algorithm for expander codes with appropriate inner codes. We use a formulation of expander codes due to [38]. Let $G$ be a $d$-regular expander graph on $n$ vertices with parameter $\lambda$. We will take $G$ to be a Ramanujan graph, that is, so that $\lambda \le \frac{2\sqrt{d-1}}{d}$; explicit constructions of Ramanujan graphs are known [37] [25] [53] for arbitrarily large values of $d$. Let $H$ be the double cover of $G$. That is, $H$ is a bipartite graph whose vertices $V(H)$ are two disjoint copies $V_0$ and $V_1$ of $V(G)$, and so that

$$E(H) = \{(u_0, v_1) : (u, v) \in E(G)\},$$

where $u_i$ denotes the copy of $u$ in $V_i$. Fix a linear inner code $C_0$ over $\Sigma$ of rate $r_0$ and relative distance $\delta_0$. Let $N = nd$. For $v_i \in V(H)$, let $E(v_i)$ denote the edges attached to $v_i$. The expander code $C \subseteq \Sigma^N$ of length $N$ arising from $G$ and $C_0$ is given by

$$C = C_N(C_0, G) = \{x \in \Sigma^N : x|_{E(v_i)} \in C_0 \text{ for all } v_i \in V(H)\}. \tag{1}$$

The following theorem shows that as long as the inner code $C_0$ has good rate and distance, so does the resulting code $C$.

Theorem 4 ([34, 32]). The code $C$ has rate $r = 2r_0 - 1$, and as long as $2\lambda < \delta_0$, the relative distance of $C$ is at least $\delta_0^2/2$.
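To make the definition concrete, the following toy sketch builds the double cover of $K_4$ (so $n = 4$, $d = 3$, $N = 12$) and enumerates the resulting code by brute force, using the binary even-weight code of length 3 as the inner code. Both choices are illustrative assumptions: $K_4$ is far too small to be a useful expander and this inner code has no interesting reconstruction procedure. Since $r_0 = 2/3$, the rate bound $2r_0 - 1 = 1/3$ predicts at least $2^4$ codewords.

```python
from itertools import product


def double_cover_edges(adj):
    """Edges of H: the pair (u, v) stands for (u_0, v_1), one per ordered adjacent pair of G."""
    return [(u, v) for u in sorted(adj) for v in sorted(adj[u])]


def is_codeword(x, edges, n, in_inner_code):
    """Check that the symbols around every vertex of H form an inner codeword."""
    for side in (0, 1):              # side 0: vertices of V_0, side 1: vertices of V_1
        for vertex in range(n):
            local = tuple(x[i] for i, e in enumerate(edges) if e[side] == vertex)
            if not in_inner_code(local):
                return False
    return True


def even_parity(word):
    """Toy inner code: binary words of even weight (length d = 3, rate 2/3)."""
    return sum(word) % 2 == 0


K4 = {u: {v for v in range(4) if v != u} for u in range(4)}
EDGES = double_cover_edges(K4)       # N = n * d = 12 positions
CODEWORDS = [
    x for x in product((0, 1), repeat=len(EDGES))
    if is_codeword(x, EDGES, 4, even_parity)
]
```

Here the enumeration finds $32 = 2^5$ codewords, one dimension more than the bound guarantees, because the eight parity constraints on the connected bipartite graph $H$ satisfy exactly one linear dependency.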

2.1 Local Correction 

If the inner code $C_0$ has a smooth local reconstruction procedure, then not only does $C$ have good distance, but in fact it is also efficiently locally decodable. Our main result is the following theorem.

Theorem 5. Let $C_0$ be a linear code over $\Sigma$ of length $d$ and rate $r_0 > 1/2$. Suppose that $C_0$ has an $s_0$-smooth local reconstruction procedure with query complexity $q_0$. Let $C = C_N(C_0, G)$ be the expander code of length $N$ arising from the inner code $C_0$ and a Ramanujan graph $G$. Choose any $\gamma < 1/2$ and any $\zeta > 0$ satisfying $\gamma (e^{\zeta} q_0)^{-1/\gamma} > 8\lambda$. Then $C$ is $(q, \rho)$-locally correctable, for any error rate $\rho$ with $\rho < \gamma (e^{\zeta} q_0)^{-1/\gamma} - 2\lambda$. The success probability is

$$1 - \left(\frac{N}{d}\right)^{-1/\ln(d/4)}$$

and the query complexity is

$$q = \left(\frac{N}{d}\right)^{\varepsilon}, \qquad \varepsilon = \left(1 + \frac{\ln(q') + 1}{\zeta}\right) \frac{\ln(q')}{\ln(d/4)}.$$

Further, when the length of the inner code, $d$, is constant, the correction algorithm runs in time $O(|\Sigma|^{q'+1} q)$, where $q' = q_0 + (d - s_0)$.

Remark 1. We will choose $d$ (and hence $q_0 < d$) and $|\Sigma|$ to be constant. Thus, the rate of $C$, as well as the parameters $\rho$ and $\varepsilon$, will be constants independent of the block length $N$. The parameter $\zeta$ trades off between the query complexity and the allowable error rate. When $q_0$ is much smaller than $d$ (for example, $q_0 = 3$ and $d$ is reasonably large), we will want to take $\zeta = O(1)$. On the other hand, if $q_0 = d^{\varepsilon}$ and $d$ is chosen to be a sufficiently large constant, we should take $\zeta \sim \ln(q_0)$.

Before diving into the details, we outline the correction algorithm. First, we observe that it suffices to consider the case when $Q_0$ is perfectly smooth: that is, the queries of the inner code are uniformly random. Otherwise, if $Q_0$ is $s_0$-smooth with $q_0$ queries, we may modify it so that it is $d$-smooth with $q_0 + (d - s_0)$ queries, by having it query extra points and then ignore them. Thus, we set $q' = q_0$ and assume in the following that $Q_0$ makes $q_0$ perfectly smooth queries.

Suppose that $C_0$ has local reconstruction algorithm $(Q_0, A_0)$, and we receive a corrupted codeword, $w$, which differs from a correct codeword $c^*$ in at most a $\rho$ fraction of the entries. Say we wish to determine $c^*[(u_0, v_1)]$, for $(u_0, v_1) \in E(H)$. The algorithm proceeds in two steps. The first step is to find a set of about $N^{\varepsilon/2}$ query positions which are nearly uniform in $[N]$, and whose correct values together determine $c^*[(u_0, v_1)]$. The second step is to correct each of these queries with very high probability; for each, we will make another $N^{\varepsilon/2}$ or so queries.

Step 1. By construction, $c^*[(u_0, v_1)]$ is a symbol in a codeword of the inner code, $C_0$, which lies on the edges emanating from $u_0$. By applying $Q_0$, we may choose $q_0$ of these edges, $S = \{(u_0, s_1^{(i)}) : i \in [q_0]\}$, so that

$$A_0(c^*|_S, (u_0, v_1)) = c^*[(u_0, v_1)].$$

Now we repeat on each of these edges: each $(u_0, s_1^{(i)})$ is part of a codeword emanating from $s_1^{(i)}$, and so $q_0$ more queries determine each of those, and so on. Repeating this $L_1$ times yields a $q_0$-ary tree $T$ of depth $L_1$, whose nodes are labeled by edges of $H$. This tree-making procedure is given more precisely below in Algorithm 2. Because the queries are smooth, each path down this tree is a random walk in $H$; because $G$ is an expander, this means that the leaves themselves, while not independent, are each close to uniform on $E(H)$. Note that at this point, we have not made any queries, merely documented a tree, $T$, of edges we could query.



Step 2. Our next step is to actually make queries to determine the correct values on the edges represented in the leaves of $T$. By construction, these values determine $c^*[(u_0, v_1)]$. Unfortunately, in expectation a $\rho$ fraction of the leaves are corrupted, and without further constraints on $C_0$, even one corrupted leaf is enough to give the wrong answer. To make sure that we get all of the leaves correct, we use the fact that each leaf corresponds to a position in the codeword that is nearly uniform (and in particular nearly independent of the location we are trying to reconstruct). For each edge, $e$, of $H$ that shows up on a leaf of $T$, we repeat the tree-making process beginning at this edge, resulting in new $q_0$-ary trees $T_e$ of depth $L_2$. This time, we make all the queries along the way, resulting in an evaluated tree $\tau_e$, whose nodes are labeled by elements of $\Sigma$; the root of $\tau_e$ is the $e$-th position in the corrupted codeword, $w[e]$, and we hope to correct it to $c^*[e]$.

For a fixed edge, $e$, on a leaf of $T$, we will correct the root of $\tau = \tau_e$ with very high probability, large enough to tolerate a union bound over all the trees $T_e$. For two labelings $\sigma$ and $\nu$ of the same tree by elements of $\Sigma$, we define the distance

$$D(\sigma, \nu) = \max_{P} \Delta(\sigma|_P, \nu|_P), \tag{2}$$

where the maximum is over all paths $P$ from the root to a leaf, and $\sigma|_P$ denotes the restriction of $\sigma$ to $P$. We will show below in Section 2.2 that it is very unlikely that $\tau$ contains a path from the root to a leaf with more than a constant fraction $\alpha < 1/2$ of errors. Thus, in the favorable case, the distance between the correct tree $\tau^*$ arising from $c^*$ and the observed tree $\tau$ is at most $D(\tau^*, \tau) \le \alpha$. In contrast, we will show that if $\sigma^*$ and $\tau^*$ are both trees arising from legitimate codewords with distinct roots, then $\sigma^*$ and $\tau^*$ must differ on an entire path $P$, and so $D(\sigma^*, \tau) \ge 1 - \alpha$. To take advantage of this, we show in Algorithm 3 how to efficiently compute

$$\mathrm{Score}(a) = \min_{\sigma^* :\, \mathrm{root}(\sigma^*) = a} D(\sigma^*, \tau)$$

for all $a \in \Sigma$, where $\mathrm{root}(\sigma^*)$ denotes the label on the root of $\sigma^*$. The above argument (made precise below in Section 2.2) shows that there will be a unique $a \in \Sigma$ with score at most $\alpha$, and this will be the correct symbol $c^*[e]$.

Finally, with all of the leaves of $T$ correctly evaluated, we may use $A_0$ to work our way back up $T$ and determine the correct symbol corresponding to the edge at the root of $T$. The complete correction algorithm is given below in Algorithm 1.

Algorithm 1: correct: Local correcting protocol.
Input: An index $e_0 \in E(H)$, and a corrupted codeword $w \in \Sigma^{E(H)}$.
Output: With high probability, the correct value of the $e_0$'th symbol.
Set $L_1 = \log(n)/\log(d/4)$ and a parameter $L_2$
$T$ = makeTree($e_0$, $L_1$)
for each edge $e$ of $H$ that showed up on a leaf of $T$ do
    $T_e$ = makeTree($e$, $L_2$)
    Let $\tau_e = T_e|_w$ be the tree of symbols from $w$
    $w^*[e]$ = correctSubtree($\tau_e$)
Initialize a $q_0$-ary tree $\tau^*$ of depth $L_1$
Label the leaves of $\tau^*$ according to $T$ and $w^*$: if a leaf of $T$ is labeled $e$, label the corresponding leaf of $\tau^*$ with $w^*[e]$.
Use the local reconstruction algorithm $A_0$ of $C_0$ to label all the nodes in $\tau^*$
return The label on the root of $\tau^*$

The number of queries made by Algorithm 1 is

$$q = q_0^{L_1 + L_2} \tag{3}$$

and the running time is $O(t_d |\Sigma|^{q_0+1} q)$, where $t_d$ is the time required to run the local correction algorithm of $C_0$. For us, both $d$ and $|\Sigma|$ will be constant, and so the running time is $O(q)$.
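As a quick sanity check on the query count, the sketch below compares the figure $q_0^{L_1+L_2}$ with an exact count of tree nodes. It assumes that every node of every evaluated tree is read, with repeats counted, so it gives an upper bound on the number of distinct positions touched; the function name is illustrative.

```python
def query_counts(q0, L1, L2):
    """Return (nominal, exact) query counts for the two-level tree construction.

    nominal: q0 ** (L1 + L2), the leading-order figure.
    exact:   number of nodes in all evaluated trees, one tree per leaf of T,
             each a complete q0-ary tree of depth L2 (repeats are counted).
    """
    if q0 < 2:
        raise ValueError("need at least two queries per node")
    leaves_of_T = q0 ** L1
    nodes_per_tree = (q0 ** (L2 + 1) - 1) // (q0 - 1)   # geometric series 1 + q0 + ... + q0^L2
    return q0 ** (L1 + L2), leaves_of_T * nodes_per_tree
```

The two counts differ by a factor below $q_0/(q_0 - 1) \le 2$, so the exponent of $N$ in the query complexity is unaffected.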



Algorithm 2: makeTree: Uses the local correction property of $C_0$ to construct a tree of indices.
Input: An initial edge $e_0 = (u_0, v_1) \in E(H)$, and a depth $L$.
Output: A $q_0$-ary tree $T$ of depth $L$, whose nodes are indexed by edges of $H$, with root $e_0$
Initialize a tree $T$ with a single node labeled $e_0$
$s = 0$
for $\ell \in [L]$ do
    Let leaves be the current leaves of $T$
    for $e = (u_s, v_{1-s}) \in$ leaves do
        Let $\{v_{1-s}^{(i)} : i \in [d]\}$ be the neighbors of $u_s$ in $H$
        Choose queries $Q_0(e) \subseteq \{(u_s, v_{1-s}^{(i)}) : i \in [d]\}$, and add each query to $T$ as a child at $e$.
    $s = 1 - s$
return $T$
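A minimal sketch of the tree-making procedure follows. As a stand-in for the inner code's query algorithm $Q_0$, it picks a uniformly random $q_0$-subset of the other $d - 1$ edges at the current anchor vertex; a real $Q_0$ depends on the inner code, so this choice, the level-by-level list representation and all names are assumptions of the sketch.

```python
import random


def make_tree(e0, depth, adj, q0, rng):
    """Build the index tree level by level; levels[t][j] is an edge (a, b) = (a_0, b_1) of H.

    The children of node j at level t are nodes j*q0 .. j*q0 + q0 - 1 at level t + 1.
    """
    levels = [[e0]]
    s = 0                                    # side of the anchor vertex: 0 = left, 1 = right
    for _ in range(depth):
        next_level = []
        for (a, b) in levels[-1]:
            if s == 0:                       # edges sharing the left endpoint a_0
                candidates = [(a, w) for w in sorted(adj[a]) if w != b]
            else:                            # edges sharing the right endpoint b_1
                candidates = [(w, b) for w in sorted(adj[b]) if w != a]
            next_level.extend(rng.sample(candidates, q0))
        levels.append(next_level)
        s = 1 - s
    return levels
```

No symbol of the received word is read here; the output only records which edges would be queried.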



Algorithm 3: correctSubtree: Correct the root of a fully evaluated tree $\tau$.
Input: $\tau$, a $q_0$-ary tree of depth $L$ whose nodes are labeled with elements of $\Sigma$.
Output: A guess at the root of the correct tree $\tau^*$.
For a node $x$ of $\tau$, let $\tau[x]$ denote the label on $x$.
for leaves $x$ of $\tau$ and $a \in \Sigma$ do
    $\mathrm{best}_a(x) = 1$ if $\tau[x] \ne a$, and $0$ if $\tau[x] = a$
for $\ell = L-1, L-2, \ldots, 0$ do
    for nodes $x$ at level $\ell$ in $\tau$ and $a \in \Sigma$ do
        Let $y_1, \ldots, y_{q_0}$ be the children of $x$
        Let $S_a \subseteq \Sigma^{q_0}$ be the set of query responses for the children of $x$ so that $A_0$ returns $a$ on those responses
        $\mathrm{best}_a(x) = \min_{(a_1, \ldots, a_{q_0}) \in S_a} \max_{r \in [q_0]} \left( \mathrm{best}_{a_r}(y_r) + \mathbf{1}_{\tau[y_r] \ne a_r} \right)$
Let $r$ be the root of $\tau$
for $a \in \Sigma$ do
    $\mathrm{Score}(a) = \left( \mathrm{best}_a(r) + \mathbf{1}_{\tau[r] \ne a} \right) / (L + 1)$
return $a \in \Sigma$ with the smallest $\mathrm{Score}(a)$
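The dynamic program can be sketched as follows. This is a toy version under stated assumptions: the tree is stored level by level, the set of legitimate trees is taken to be every labeling in which each node equals `A0` applied to its children (the real set $\mathcal{T}_e$ is smaller), every symbol is assumed to be a possible output of `A0`, and leaves start at 0 so that each node's mismatch is charged exactly once, by its parent or by the final score for the root. With that convention the score equals the path distance $D$ of (2) in this toy model; other initialisations shift it by at most one mismatch per path.

```python
from itertools import product


def correct_subtree(levels, alphabet, q0, A0):
    """Return (guess, scores) for the root of an evaluated q0-ary tree.

    levels[t][j] is the observed label of node j at level t; its children are
    nodes j*q0 .. j*q0 + q0 - 1 at level t + 1. A0 maps a tuple of q0 child
    labels to the parent label they determine.
    """
    depth = len(levels) - 1
    # responses[a] plays the role of S_a: child-label tuples on which A0 returns a
    responses = {a: [] for a in alphabet}
    for tup in product(alphabet, repeat=q0):
        responses[A0(tup)].append(tup)

    best = [{a: 0 for a in alphabet} for _ in levels[depth]]
    for level in range(depth - 1, -1, -1):
        observed_children = levels[level + 1]
        new_best = []
        for j in range(len(levels[level])):
            kids = range(j * q0, (j + 1) * q0)
            new_best.append({
                a: min(
                    max(best[y][ar] + (observed_children[y] != ar)
                        for y, ar in zip(kids, tup))
                    for tup in responses[a]
                )
                for a in alphabet
            })
        best = new_best

    root_label = levels[0][0]
    scores = {a: (best[0][a] + (root_label != a)) / (depth + 1) for a in alphabet}
    return min(alphabet, key=lambda a: scores[a]), scores
```

For instance, with binary labels, $q_0 = 2$ and `A0` the XOR of the two children, the consistent tree with leaves `[1, 0, 1, 1]` has root 1; if only the root is flipped to 0, the sketch returns 1 with scores 1/3 for the symbol 1 and 2/3 for the symbol 0.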



2.2 Proof of Theorem 5

Suppose that $c^* \in C$, and Algorithm 1 is run on a received word $w$ with $\Delta(c^*, w) \le \rho$. To prove Theorem 5, we must show that Algorithm 1 returns $c^*[e_0]$ with high probability. As remarked above, we assume that $Q_0$ is perfectly smooth.

We follow the proof outline sketched in Section 2.1, which rests on the following observation.

Proposition 6. Let $c_1, c_2 \in C$ and let $e \in E(H)$ so that $c_1[e] \ne c_2[e]$. Let the distance $D$ between trees with labels in $\Sigma$ be as in (2). Let $T$ = makeTree($e$), and let $\tau = T|_{c_1}$ and $\sigma = T|_{c_2}$ be the labeled trees corresponding to $c_1$ and $c_2$ respectively. Then $D(\tau, \sigma) = 1$. That is, there is some path from the root to a leaf of $T$ so that $\tau$ and $\sigma$ disagree on the entire path.

Proof. Since $c_1[e] \ne c_2[e]$, $\tau$ and $\sigma$ have different symbols at their root. Since the labels on the children of any node determine the label on the node itself (via the local correction algorithm), it must be that $\tau$ and $\sigma$ differ on some child of the root. Repeating the argument proves the claim. □

Now let $\tau_e$ be the tree arising from the received word $w$, starting at $e$, as in Algorithm 1. Let

$$\mathcal{T}_e = \{\text{makeTree}(e)|_c : c \in C\}$$

be the set of query trees arising from uncorrupted codewords, and let $\tau^* \in \mathcal{T}_e$ be the "correct" tree, corresponding to the original uncorrupted codeword $c^*$. Suppose that

$$D(\tau_e, \tau^*) \le \alpha \tag{4}$$

for some $\alpha \in [0, 1/2)$. Then Proposition 6 implies that any $\sigma^* \in \mathcal{T}_e$ with a different root from $\tau^*$ has

$$D(\tau_e, \sigma^*) \ge 1 - \alpha. \tag{5}$$

Indeed, there is some path along which $\tau^*$ and $\sigma^*$ differ in every place, and along this path, $\tau_e$ agrees with $\tau^*$ in at least a $1 - \alpha$ fraction of the places. Thus, $\tau_e$ disagrees with $\sigma^*$ in those same places, establishing (5). Consider the quantity

$$\mathrm{Score}(a) = \min_{\sigma^* \in \mathcal{T}_e :\, \mathrm{root}(\sigma^*) = a} D(\tau_e, \sigma^*). \tag{6}$$

Equations (4) and (5) imply that if $a^*$ is the label on the root of $\tau^*$, then $\mathrm{Score}(a^*) \le \alpha$, and otherwise, $\mathrm{Score}(a) \ge 1 - \alpha$. Thus, to establish the correctness of Algorithm 1, it suffices to argue first that Algorithm 3 correctly computes $\mathrm{Score}(a)$ for each $a$, and second that (4) holds for all trees $\tau_e$ in Algorithm 1.

The first claim follows by inspection. For a node $x \in \tau_e$, let $(\tau_e)_x$ denote the subtree below $x$. Let $\mathcal{T}_e^{x,a}$ denote the set of trees in $\mathcal{T}_e$ so that the node $x$ is labeled $a$. Throughout Algorithm 3, the quantity $\mathrm{best}_a(x)$ gives the distance from the observed tree rooted at $x$ to the best tree in $\mathcal{T}_e$ rooted at $x$, with the additional restriction that the label at $x$ should be $a$. That is,

$$\mathrm{best}_a(x) = \min_{\sigma^* \in \mathcal{T}_e^{x,a}} \tilde{D}\left((\sigma^*)_x, (\tau_e)_x\right), \tag{7}$$

where $\tilde{D}$ is the same as $D$ except it does not count the root, and it is not normalized. It is easy to see that (7) is satisfied for leaves $x$ of $\tau_e$. Then for each node, Algorithm 3 updates $\mathrm{best}_a(x)$ by considering the best labeling on the children of $x$ consistent with $\tau[x] = a$, taking the distance of the worst of those children, and adding one if necessary.

To establish the second claim, that (4) holds for all trees $\tau_e$, we will need the following lemma about random walks on $H$.

Lemma 7. Let $G$ and $H$ be as above, and suppose $\rho > 6\lambda$. Let $v_0, \ldots, v_L$ be a random walk of length $L$ on $H$, starting from the left side at a vertex chosen from a distribution $\nu$ with $\left\|\nu - \frac{1}{n}\mathbf{1}_n\right\| \le \frac{1}{\sqrt{n}}$. Let $X$ denote the number of corrupted edges included in the walk, and let $\rho + 2\lambda < \alpha < 1/2$. Then

$$\mathbb{P}\{X \ge \alpha L\} \le \exp\left(-L \, D(\alpha \,\|\, \rho + 2\lambda)\right).$$

In particular, when $\rho + 2\lambda < \ln(1/(1 - \alpha))$, we have

$$\mathbb{P}\{X \ge \alpha L\} \le \left(\frac{\rho + 2\lambda}{\alpha}\right)^{\alpha L}.$$

Lemma 7 says that a random walk on $H$ will not hit too many corrupted edges, which is very much like the expander Chernoff bound [22, 19]. In this case, $H$ is the double cover of an expander, not an expander itself, and the edges, rather than vertices, are corrupted, but the proof remains basically the same. For completeness, we include the proof in the appendix. The conditions on $\rho$ and $\lambda$ in the statement of Theorem 5 imply that $\rho > 6\lambda$, and so Lemma 7 applies to random walks on $H$.
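To get a feel for the first tail bound in Lemma 7, it can be evaluated numerically. The sketch assumes that $D(\cdot \,\|\, \cdot)$ denotes the Kullback-Leibler divergence between Bernoulli distributions, which is the usual meaning in Chernoff-type bounds but is not spelled out in this section; the function names are illustrative.

```python
import math


def kl_bernoulli(a, p):
    """KL divergence D(a || p) between Bernoulli(a) and Bernoulli(p), for 0 < a, p < 1."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))


def walk_tail_bound(L, alpha, rho, lam):
    """Upper bound exp(-L * D(alpha || rho + 2*lam)) on P{X >= alpha * L}."""
    p = rho + 2 * lam
    if not (p < alpha < 0.5):
        raise ValueError("need rho + 2*lam < alpha < 1/2")
    return math.exp(-L * kl_bernoulli(alpha, p))
```

The bound decays exponentially in the walk length $L$, which is what allows the union bound over all root-to-leaf paths later in the proof.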



Suppose that $L_1$ is even, and consider any leaf of $T$. This leaf has label $(u_0, v_1) \in E(H)$, where $u$ is the result of a random walk of length $L_1$ on $G$ and $v$ is a randomly chosen neighbor of $u$. Because $G$ is a Ramanujan graph, the distribution $\mu$ on $u$ satisfies

$$\left\|\mu - \frac{1}{n}\mathbf{1}_n\right\| \le \frac{1}{\sqrt{n}}$$

as long as

$$L_1 \ge \frac{\log(n)}{\log(d/4)}.$$

Thus, Lemma 7 applies to random walks in $H$ starting at $e$. Fix a leaf of $\tau_e$; by the smoothness of the query algorithm $Q_0$, each path from the root to the leaf of each tree $\tau_e$ is a uniform random walk, and so with high probability, the number of corrupted edges on this walk is not more than $\alpha L_2$, which was the desired outcome.
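The requirement on $L_1$ can be checked numerically. The sketch assumes the standard mixing estimate that after $L$ steps the walk distribution is within $\lambda^L$ of uniform in Euclidean norm, together with the Ramanujan bound $\lambda \le 2/\sqrt{d}$; it computes the smallest integer $L$ with $(d/4)^L \ge n$.

```python
import math


def min_walk_length(n, d):
    """Smallest integer L with (d/4)**L >= n, i.e. L >= log(n) / log(d/4)."""
    if d <= 4:
        raise ValueError("need d > 4 so that d/4 > 1")
    L = 0
    while (d / 4) ** L < n:
        L += 1
    return L


def mixing_distance_bound(L, d):
    """Bound lambda**L on the distance to uniform, using lambda <= 2 / sqrt(d)."""
    return (2 / math.sqrt(d)) ** L
```

For $n = 2^{20}$ and $d = 64$ this gives $L_1 = 5$, and $(2/\sqrt{64})^5 = 2^{-10} = 1/\sqrt{n}$, so the bound is met with equality.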

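Finally, we union bound over $q_0^{L_1}$ trees $\tau_e$ and $q_0^{L_2}$ paths in each tree. We will set $L_2 = C L_1$, for a constant $C$ to be determined. Thus, (4) holds (and hence Algorithm 1 is correct) except with probability at most

$$\mathbb{P}\{\text{Algorithm 1 fails}\} \le \exp\left((C + 1) L_1 \ln(q_0) - C \alpha L_1 \ln\left(\frac{\alpha}{\rho + 2\lambda}\right)\right).$$

The assumption that $\rho < \gamma (e^{\zeta} q_0)^{-1/\gamma} - 2\lambda$ implies that, taking $\alpha = \gamma$, we may choose

$$C = \frac{\ln(q_0) + 1}{\alpha \ln\left(\frac{\alpha}{\rho + 2\lambda}\right) - \ln(q_0)} < \frac{\ln(q_0) + 1}{\zeta},$$

and the failure probability is at most $\exp(-L_1)$. From (3), $q = q_0^{(1 + C) L_1}$, which completes the proof of Theorem 5.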
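The choice of the constant $C$ can be verified by solving the union bound for the value that makes the exponent equal to $-L_1$. The sketch below does this per unit of $L_1$; the function names and the sample parameters in the usage note are illustrative assumptions.

```python
import math


def depth_ratio(q0, alpha, rho, lam):
    """The constant C = L2 / L1 that makes the failure exponent equal to -L1."""
    gap = alpha * math.log(alpha / (rho + 2 * lam)) - math.log(q0)
    if gap <= 0:
        raise ValueError("error rate too large: need alpha*ln(alpha/(rho+2*lam)) > ln(q0)")
    return (math.log(q0) + 1) / gap


def failure_exponent_per_L1(C, q0, alpha, rho, lam):
    """Exponent of the union bound divided by L1: (C+1) ln q0 - C alpha ln(alpha/(rho+2 lam))."""
    return (C + 1) * math.log(q0) - C * alpha * math.log(alpha / (rho + 2 * lam))
```

For example, with $q_0 = 3$, $\alpha = 1/4$, $\rho = 0.0005$ and $\lambda = 0.00025$, the resulting $C$ is roughly 7.4 and the exponent per unit of $L_1$ is $-1$ up to rounding.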

3 Examples 

In this section, we provide two examples of choices for $C_0$, both of which result in $(N^{\varepsilon}, \rho)$-LCCs of rate $1 - \alpha$ for any sufficiently small constants $\varepsilon, \alpha > 0$ and for some constant $\rho > 0$. Our first and main example is a generalization of Reed-Muller codes, based on finite geometries. With these codes as $C_0$, we provide LCCs over $\mathbb{F}_p$; unlike multiplicity codes, these codes work naturally over small fields.

Our second example comes from the observation that if $C_0$ is itself an LCC (of a fixed length), our construction provides a new family of $(N^{\varepsilon}, \rho)$-LCCs. In particular, plugging the multiplicity codes of [25] into our construction yields a novel family of LCCs. This new family of LCCs has a very different structure than the underlying multiplicity codes, but achieves roughly the same rate and locality.

Codes from Affine Geometries. One advantage of our construction is that the inner code $C_0$ need not actually be a good locally decodable or correctable code. Rather, we only need a smooth reconstruction procedure, which is easier to come by. One example comes from affine geometries; in this example, we will show how to use Theorem 5 to make LCCs of length $N$, rate $1 - \alpha$ and query complexity $N^{\varepsilon}$, for any sufficiently small $\alpha, \varepsilon > 0$.

For a prime power $h = p^{\ell}$ and parameters $r$ and $m$, consider the $r$-dimensional affine subspaces $L_1, \ldots, L_t$ of the vector space $\mathbb{F}_h^m$. Let $H$ be the $t \times h^m$ incidence matrix of the $L_i$ and the points of $\mathbb{F}_h^m$, and let $A^*(r, m, h)$ be the code over $\mathbb{F}_p$ whose parity check matrix is $H$. These codes, examples of finite geometry codes, are
well-studied, and their ranks can be exactly computed — see [SJ |3] for an overview. 

The definition of $A^*(r, m, h)$ gives a reconstruction procedure: we may query all the points in a random $r$-dimensional affine subspace of $\mathbb{F}_h^m$ and use the corresponding parity check. In particular, if we index the positions of the codeword by elements of $\mathbb{F}_h^m$, then given the position $x \in \mathbb{F}_h^m$, the query set $Q(x)$ is all the points other than $x$ in a random $r$-flat $L$ that passes through $x$. Given a codeword $c \in A^*(r, m, h)$, we may reconstruct $c_x$ by

$$A(c|_{Q(x)}) = -\sum_{y \in Q(x)} c_y.$$

By definition, $(A, Q)$ is a smooth reconstruction procedure which makes at most $h^r$ queries.
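The reconstruction rule can be tried out on a small case with $r = 1$ (lines), $h = p = 5$ and $m = 2$. The sketch relies on the fact that a polynomial of total degree at most $p - 2$ sums to zero over every line of $\mathbb{F}_p^m$, so its evaluation table satisfies all the line parity checks; the particular polynomial and the function names are illustrative assumptions.

```python
import random
from itertools import product

P, M = 5, 2   # field size p (prime) and dimension m


def f(point):
    """A polynomial of total degree 3 = p - 2 over F_5; its evaluations sum to 0 on every line."""
    a, b = point
    return (2 + 3 * a + a * b * b + 4 * a ** 3) % P


CODEWORD = {pt: f(pt) for pt in product(range(P), repeat=M)}


def query_set(x, rng):
    """The p - 1 points other than x on a uniformly random line through x."""
    direction = (0,) * len(x)
    while not any(direction):                      # reject the zero direction
        direction = tuple(rng.randrange(P) for _ in x)
    return [tuple((xi + t * di) % P for xi, di in zip(x, direction))
            for t in range(1, P)]


def reconstruct(word, x, rng):
    """Recover the symbol at x as minus the sum of the queried symbols, mod p."""
    return -sum(word[y] for y in query_set(x, rng)) % P
```

Each queried point is uniformly distributed over the $d - 1$ positions other than $x$, so the procedure is $(d-1)$-smooth, but a single corrupted symbol on the chosen line changes the answer.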

The locality of $A^*(r, m, h)$ has been noticed before, for example in [20], where it was observed that these codes could be viewed as lifted parity check codes. However, as they note, these codes do not themselves make good LCCs: the reconstruction procedure cannot tolerate any errors in the chosen subspace, and thus the error rate $\rho$ must tend to zero as the block length grows. Even though these codes are not good LCCs, we can use them in Theorem 5 to obtain good LCCs with sublinear query complexity, which can correct a constant fraction of errors. We will use the bound on the rate of $A^*(1, m, h)$ from [20]:

Lemma 8 (Lemma 3.7 in 20 ). Choose £ = em, with h — p e as above. The dimension of A*(l,m,h) is at 
least h m - h m ^^, for /3 = fi{e') = n(2- 2 / £ '). 

We will apply Lemma 8 with



^ and m~ ' H " /a) 



2 \e'l3{e')\n{ P y 

to obtain a $p$-ary code $C_0$ of length $d = p^{\ell m}$ with rate $r_0$ at least $1 - \alpha/2$ and which has a $(d-1)$-smooth reconstruction algorithm with query complexity $q_0 = d^{\varepsilon'}$. To apply Theorem 5, fix any $\varepsilon, \alpha > 0$, sufficiently small. We set $\zeta = 2\ln(q_0)$, choose $\gamma = 1/4$ in Theorem 5, and use $C_0$. The resulting expander code $C$ has rate $1 - \alpha$ and query complexity

$$q \le \left(\frac{N}{d}\right)^{\varepsilon}$$

for sufficiently large $d$. Finally, using the fact that $\lambda \le 2/\sqrt{d}$, we see that $C$ corrects against a $\rho$ fraction of errors, where

$$\rho < \gamma (e^{\zeta} q_0)^{-1/\gamma} - 2\lambda = \tfrac{1}{4} q_0^{-12} - 2\lambda,$$

which is positive, again for sufficiently large $d$, as long as $\varepsilon < 1/12$. Assuming $\varepsilon$ and $\alpha$ are small enough that $d$ is a suitably large constant, this rate $\rho$ is a positive constant, and we achieve the advertised results.

Multiplicity codes. Multiplicity codes [5S] are themselves a family of constant-rate locally decodable 
codes. We can, however, use a multiplicity code of constant length as the inner code Cq in our construction. 
This results in a new family of constant-rate locally decodable codes. The parameters we obtain from this 
construction are slightly worse than the original multiplicity codes, and the main reason we include this 
example is novelty — these new codes have a very different structure than the original multiplicity codes. 

For constants a' , e' > 0, the multiplicity codes of [25] have length d and rate tq — 1 — a' and a (d — 1)- 
smooth local reconstruction algorithm with query complexity qo = 0(d e ). To apply Theorem [5] we will 
choose C = Cln(go) for a sufficiently large constant C, and so the query complexity of C will be 

for an arbitrarily small constant $\beta$. Thus, setting $\varepsilon = \varepsilon'(1 + \beta)$ and $\alpha = 2\alpha'$, we obtain codes $\mathcal{C}$ with rate 
$1 - \alpha$ and query complexity $(N/d)^{\varepsilon}$. As long as $\varepsilon$ is sufficiently small, $\mathcal{C}$ can tolerate errors up to $\rho = C' d^{-C''\varepsilon}$ 
for constants $C'$ and $C''$ (depending on the constants in the constructions of the multiplicity code, as well 
as on $C$ above). Multiplicity codes require sufficiently large block length $d$, on the order of 

'\ g (± 

V ae 



Choosing this $d$ results in a requirement $\rho < 1/\mathrm{poly}(\alpha\varepsilon)$. We remark that the distance of the multiplicity 
codes is on the order of $\delta_0 = \Omega(\alpha^2 \varepsilon)$, and so the distance of the resulting expander code $\mathcal{C}$ is $\Omega(\alpha^4 \varepsilon^2)$. 

4 Conclusion 

In the constant-rate regime, all known LDCs work by using a smooth local reconstruction algorithm. When 
the locality is, say, three, then with very high probability none of the queried positions will be corrupted. 
This reasoning fails for constant rate codes, which have larger query complexity: we expect a $\rho$ fraction of 
errors in our queries, and this is often difficult to deal with. In this work, we have shown how to make the 
low-query argument valid in a high-rate setting: any code with large enough rate and with a good local 
reconstruction algorithm can be used to make a full-blown locally correctable code. 
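
The low-query reasoning in this paragraph can be made concrete with a union bound. The sketch below is only illustrative: the function names are hypothetical, and it assumes that each of the q queries is individually uniform over the codeword positions (perfect smoothness) and that at most a rho fraction of positions is corrupted.

```python
def prob_any_query_corrupted(q, rho):
    """Union bound on the probability that at least one of q uniform queries
    lands on a corrupted position, when a rho fraction is corrupted."""
    return min(1.0, q * rho)


def expected_corrupted_queries(q, rho):
    """Expected number of corrupted positions among q uniform queries."""
    return q * rho


# Three queries, 1% corruption: all queries are clean with probability >= 0.97.
few = prob_any_query_corrupted(3, 0.01)

# A thousand queries, 1% corruption: the union bound is vacuous, and about
# ten of the queried positions are expected to be corrupted.
many = prob_any_query_corrupted(1000, 0.01)
many_expected = expected_corrupted_queries(1000, 0.01)
```

With few queries the bound stays far below one, which is the low-query argument; with many queries it becomes vacuous, which is why the high-rate setting needs the additional machinery developed in this work.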

This work presented the first sublinear time algorithm for locally correcting expander codes. More 
precisely, we have shown that as long as the inner code $\mathcal{C}_0$ admits a smooth local reconstruction algorithm 
with appropriate parameters, then the resulting expander code $\mathcal{C}$ is a $(N^{\varepsilon}, \rho)$-LCC with rate $1 - \alpha$, for 
any $\alpha, \varepsilon > 0$ and some constant $\rho$. Further, we presented a decoding algorithm with runtime linear in the 
number of queries. There are only two other constructions known in this regime, and our constructions 
are substantially different. Expander codes are a natural construction, and it is our hope that the additional 
structure of our codes, as well as the extremely fast decoding time, will lead to new applications of local 
decodability. 
A Proof of Lemma 7 

In this appendix, we provide a proof of Lemma 7. The lemma follows with only a few tweaks from standard 
results. The only differences between this and a standard analysis of random walks on expander graphs 
are that (a) we are walking on the edges of the bipartite graph $H$, rather than on the vertices of $G$, and 
(b) our starting distribution is not uniform but instead close to uniform. Dealing with these differences is 
straightforward, but we document it below for completeness. 

First, we need the relationship between a walk on the edges of a bipartite graph H and the corresponding 
walk on the vertices of G. For ease of analysis, we will treat H as directed, with one copy of each edge in 
each direction. 

Lemma 9. Let $G$ be a degree $d$ undirected graph on $n$ vertices with normalized adjacency matrix $A$, and let 
$H$ be the double cover of $G$. For each vertex $v$ of $G$, label the edges incident to $v$ arbitrarily, and let $v(i)$ 
denote the $i$th edge of $v$. Let $H'$ be the graph with vertices $V(G) \times [d] \times \{0, 1\}$ and edges 

$$E(H') = \{((u, i, b), (v, j, b')) : (u, v) \in E(G),\ b \neq b',\ u(i) = v\}.$$

Then $H'$ is a directed graph with $2dn$ vertices, and in-degree and out-degree both equal to $d$. Further, the 
normalized adjacency matrix $A'$ is given by 

$$A' = R \otimes S,$$

where $S : \mathbb{R}^2 \to \mathbb{R}^2$ is 

$$S = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

and $R : \mathbb{R}^{nd} \to \mathbb{R}^{nd}$ is an operator with the same rank and spectrum as $A$. 



Proof. We will write down $A'$ in terms of $A$. Index $[n]$ by the vertices in $V$, so that $e_v \in \mathbb{R}^n$ refers to the standard 
basis vector with support on $v$. Let $\otimes$ denote the Kronecker product. We will need some linear operators. 

Let $B : \mathbb{R}^{n^2} \to \mathbb{R}^{n^2}$ so that 

$$B(e_u \otimes e_v) = e_v \otimes e_v,$$

and $P : \mathbb{R}^{n^2} \to \mathbb{R}^{nd}$ so that 

$$P(e_u \otimes e_v) = \begin{cases} e_u \otimes e_i & v = u(i) \\ 0 & (u, v) \notin E(G). \end{cases}$$

Finally, let $S : \mathbb{R}^2 \to \mathbb{R}^2$ be the cyclic shift operator. Then a computation shows that the adjacency matrix 
$A'$ of $H'$ is given by 

$$A' = (P(I \otimes A)BP^T) \otimes S.$$

Let $R = P(I \otimes A)BP^T$. To see that the rank of $R$ is at most $n$, note that for any $j \in [d]$ and any $u \in V(G)$, 

$$R(e_u \otimes e_j) = e_{u(j)} \otimes \frac{1}{d}\mathbf{1}_d.$$

In particular, it does not depend on the choice of $j$. Since $\{e_u \otimes e_j : u \in V(G),\ j \in [d]\}$ is a basis for $\mathbb{R}^{nd}$, 
the image of $R$ has dimension at most $n$. Finally, a similar computation shows that if $p$ is an eigenvector of 
$A$ with eigenvalue $\lambda$, then $p \otimes \frac{1}{d}\mathbf{1}_d$ is a right eigenvector of $R$, also with eigenvalue $\lambda$. (The left eigenvectors 
are $P(\frac{1}{n}\mathbf{1}_n \otimes p)$.) This proves the claim. □ 
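
The eigenvector claim in this proof is easy to check numerically on a small example. The following is a minimal sketch, not part of the original argument: it assumes $G$ is the $n$-cycle (so $d = 2$), uses the action of $R$ on basis vectors described in the proof, and relies on the known eigenpairs of the cycle. All function names are hypothetical.

```python
import math


def cycle_neighbors(n):
    """Neighbor lists of the n-cycle; nbrs[u][i] plays the role of u(i), d = 2."""
    return [[(u + 1) % n, (u - 1) % n] for u in range(n)]


def apply_R(nbrs, x):
    """Apply R to x, using R(e_u (x) e_j) = e_{u(j)} (x) (1/d) 1_d.

    x is an n-by-d table; x[u][j] is the coordinate of e_u (x) e_j.
    """
    n, d = len(nbrs), len(nbrs[0])
    out = [[0.0] * d for _ in range(n)]
    for u in range(n):
        for j in range(d):
            v = nbrs[u][j]
            for i in range(d):
                out[v][i] += x[u][j] / d
    return out


def lift(p, d):
    """Return p (x) (1/d) 1_d as an n-by-d table."""
    return [[p_u / d] * d for p_u in p]


n, d, k = 7, 2, 2
nbrs = cycle_neighbors(n)

# Known eigenpair of the normalized adjacency matrix of the n-cycle.
p = [math.cos(2 * math.pi * k * u / n) for u in range(n)]
lam = math.cos(2 * math.pi * k / n)

lhs = apply_R(nbrs, lift(p, d))
rhs = [[lam * entry for entry in row] for row in lift(p, d)]
residual = max(abs(a - b) for ra, rb in zip(lhs, rhs) for a, b in zip(ra, rb))
```

Here `residual` should vanish up to floating-point error, reflecting that $p \otimes \frac{1}{d}\mathbf{1}_d$ is a right eigenvector of $R$ with the same eigenvalue. Every output row of `apply_R` is constant, which is the reason the image of $R$ has dimension at most $n$.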

With a characterization of A' in hand, we now wish to apply an expander Chernoff bound. Existing 
bounds require slight modification for this case (since the graph H' is directed and also not itself an expander), 
so for completeness we sketch the changes required. The proof below follows the strategies in [1] and [22] . 
We begin with the following lemma, which follows from the analysis of [1]. 

Lemma 10. Let $G$ and $H$ be as in Lemma 9, and let $v_0, v_1, \ldots, v_T$ be a random walk on the vertices 
of $H$, beginning at a vertex of $H$ chosen as follows: the side of $H$ is chosen according to a distribution 
$\sigma_0 = (s, 1 - s)$, and the vertex within that side is chosen independently according to a distribution $\nu$ with 
$\|\nu - \frac{1}{n}\mathbf{1}_n\|_2 < \frac{1}{\sqrt{n}}$. Let $W$ be any set of edges in $H$, with $|W| < \rho nd$. Suppose that $\rho > 6\lambda$. Then for any set 
$S \subseteq \{0, 1, \ldots, T-1\}$, 

$$\mathbb{P}\{(v_t, v_{t+1}) \in W,\ \forall t \in S\} \le (\rho + 2\lambda)^{|S|}.$$

Proof. As in Lemma 9, we will consider $H$ as directed, with one edge in each direction. As before, we will 
index these edges by triples $(u, i, \ell) \in V(G) \times [d] \times \{0, 1\}$, so that $(u, i, \ell)$ refers to the $i$th edge leaving vertex 
$u$ on the $\ell$th side of $H$. Let $\mu$ be the distribution on the first step $(v_0, v_1)$ of the walk, so 

$$\mu = \nu \otimes \frac{1}{d}\mathbf{1}_d \otimes \sigma_0.$$

Let $M \in \mathbb{R}^{2nd \times 2nd}$ be the projector onto the edges in $W$. Let $M^{(0)}$ be the restriction to edges emanating from 
the left side of $H$, and $M^{(1)}$ from the right side, so that both $M^{(0)}$ and $M^{(1)}$ are $nd \times nd$ binary diagonal 
matrices with at most $\rho nd$ nonzero entries. Let $A' = R \otimes S$ be as in the conclusion of Lemma 9. After 
running the random walk for $T$ steps, consider the distribution on directed edges of $H$, conditional on the 
bad event that $(v_t, v_{t+1}) \in W$ for all $t \in S$. As in the analysis in [1], this distribution is given by 

$$\mu_T = \frac{(M_{T-1}A')(M_{T-2}A') \cdots (M_1 A')(M_0 \mu)}{\mathbb{P}\{(v_t, v_{t+1}) \in W,\ \forall t \in S\}},$$

where 

$$M_t = \begin{cases} M & t \in S \\ I & t \notin S. \end{cases}$$

Since the $\ell_1$ norm of any distribution is 1, we have 

$$\mathbb{P}\{(v_t, v_{t+1}) \in W,\ \forall t \in S\} = \|(M_{T-1}A')(M_{T-2}A') \cdots (M_1 A')(M_0 \mu)\|_1. \qquad (8)$$

Let 

$$\mu_0 := M_0 \mu,$$

and 

$$\mu_t := M_t A' \mu_{t-1},$$

so we seek an estimate on $\|\mu_T\|_1$. 

The following claim will be sufficient to prove the lemma. 

Claim 11. If $\rho > 6\lambda$, and $t \in S$, 

$$(\rho - 2\lambda)\|\mu_t\|_1 \le \|\mu_{t+1}\|_1 \le (\rho + 2\lambda)\|\mu_t\|_1.$$

On the other hand, if $t \notin S$, 

$$\|\mu_t\|_1 = \|\mu_{t+1}\|_1.$$



The second half of the claim follows immediately from the definition of $\mu_t$. To prove the first half, suppose 
that $t \in S$. We will proceed by induction. Again, we follow the analysis of [1]. 

Write $\mu_0 = v_0 \otimes \sigma_0$, and write $\sigma_0 = (s, 1 - s)$. Part of our inductive hypothesis will be that for all $t$, 

$$\mu_t = v_t^{(0)} \otimes s_t e_0 + v_t^{(1)} \otimes (1 - s_t) e_1,$$

where $s_t = s$ if $t$ is even and $1 - s$ if $t$ is odd, and where $v_t^{(i)} \in \mathbb{R}^{nd}$. For $i \in \{0, 1\}$, write 

$$v_t^{(i)} = x_t^{(i)} + y_t^{(i)},$$

where $x_t^{(i)} \parallel \mathbf{1}$ and $y_t^{(i)} \perp \mathbf{1}$. The second part of the inductive hypothesis will be 



(9) 



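for a parameter $q$ to be chosen later, and for $i \in \{0, 1\}$. 
Because 

$$\begin{aligned}
\|\mu_t\|_1 &= s_t\|v_t^{(0)}\|_1 + (1 - s_t)\|v_t^{(1)}\|_1 \\
&= s_t\|x_t^{(0)}\|_1 + (1 - s_t)\|x_t^{(1)}\|_1 \\
&= \sqrt{nd}\,\big(s_t\|x_t^{(0)}\|_2 + (1 - s_t)\|x_t^{(1)}\|_2\big),
\end{aligned}$$

it suffices to show that 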



$$(\rho - 2\lambda)\|x_t^{(0)}\|_2 \le \|x_{t+1}^{(1)}\|_2 \le (\rho + 2\lambda)\|x_t^{(0)}\|_2 \qquad (10)$$



and similarly with the 0 and 1 switched. The analysis is the same for the two cases, so we just establish 
(10). Using the decomposition $A' = R \otimes S$ from Lemma 9, 

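$$\begin{aligned}
\mu_{t+1} &= M_t (R \otimes S)\big(v_t^{(0)} \otimes s_t e_0 + v_t^{(1)} \otimes (1 - s_t) e_1\big) \\
&= M_t\big(R v_t^{(0)} \otimes (1 - s_{t+1}) e_1 + R v_t^{(1)} \otimes s_{t+1} e_0\big) \\
&= (M_t^{(1)} R v_t^{(0)}) \otimes (1 - s_{t+1}) e_1 + (M_t^{(0)} R v_t^{(1)}) \otimes s_{t+1} e_0.
\end{aligned}$$

This establishes the first inductive claim about the structure of $\mu_{t+1}$, and 

$$v_{t+1}^{(0)} = M_t^{(0)} R v_t^{(1)}$$

and 

$$v_{t+1}^{(1)} = M_t^{(1)} R v_t^{(0)}.$$

Consider just $v_{t+1}^{(1)}$. We have 

$$v_{t+1}^{(1)} = M_t^{(1)} R \big(x_t^{(0)} + y_t^{(0)}\big).$$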
Because $t \in S$, we know that $M_t^{(1)}$ is diagonal with at most $\rho nd$ nonzeros, and further we know that $R$ 
has second normalized eigenvalue at most $\lambda$, by Lemma 9. The analysis in [1] now shows that, using the 
inductive hypothesis (9), 

$$\rho\|x_t^{(0)}\|_2 - q\lambda\sqrt{\rho(1-\rho)}\,\|x_t^{(0)}\|_2 \le \|x_{t+1}^{(1)}\|_2 \le \rho\|x_t^{(0)}\|_2 + q\lambda\sqrt{\rho(1-\rho)}\,\|x_t^{(0)}\|_2, \qquad (11)$$

and that 

We must ensure that (9) is satisfied for the next round. As long as $\lambda < \rho/6$, this follows from the above 
when 

With this choice of $q$, (10) follows from (11). Further, the hypotheses on $\nu$ show that (9) is satisfied 
in the initial step. □ 
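
The statement of Lemma 10 can be checked exactly on a small instance. The sketch below is only an illustration under stated assumptions, not the proof technique: it takes $G$ to be the complete graph $K_n$ (whose normalized second eigenvalue is $1/(n-1)$ in absolute value), starts the walk from the exactly uniform distribution on each side, and uses a hypothetical deterministic choice of the edge set $W$. The function name is likewise hypothetical.

```python
def walk_stays_in_W(n, W, S, T, s=0.5):
    """Exact P{(v_t, v_{t+1}) in W for all t in S} on the double cover of K_n.

    W is a set of pairs (u, v), meaning the edge between left-copy u and
    right-copy v. The walk starts on the left side with probability s, at a
    uniformly random vertex of the chosen side.
    """
    d = n - 1
    mass = [[s / n] * n, [(1 - s) / n] * n]  # mass[side][vertex]
    for t in range(T):
        new = [[0.0] * n, [0.0] * n]
        for side in (0, 1):
            for u in range(n):
                for v in range(n):
                    if v == u:
                        continue  # K_n has no self-loops
                    edge = (u, v) if side == 0 else (v, u)
                    if t in S and edge not in W:
                        continue  # drop walks that leave W at a constrained step
                    new[1 - side][v] += mass[side][u] / d
        mass = new
    return sum(mass[0]) + sum(mass[1])


n = 13
d = n - 1
lam = 1 / d   # normalized second eigenvalue of K_n, in absolute value
rho = 0.6     # the lemma needs rho > 6 * lam = 0.5
W = {(u, v) for u in range(n) for v in range(n)
     if u != v and (u + 2 * v) % n < 7}
S = {0, 1, 2, 3}

prob = walk_stays_in_W(n, W, S, T=4)
bound = (rho + 2 * lam) ** len(S)
```

On this instance `prob` is well below `bound`, as the lemma predicts; the dynamic program simply propagates the sub-distribution $M_t A' \mu_{t-1}$ from the proof, one step at a time, and sums the surviving mass.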

Finally, we invoke the following theorem, from [22]. 

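Theorem 12 (Theorem 3.1 in [22]). Let $X_1, \ldots, X_L$ be binary random variables so that for all $S \subseteq [L]$, 

$$\mathbb{P}\Big\{\bigwedge_{i \in S} X_i = 1\Big\} \le \delta^{|S|}.$$

Then for all $\gamma > \delta$, 

$$\mathbb{P}\Big\{\sum_{i=1}^{L} X_i \ge \gamma L\Big\} \le e^{-L D(\gamma \| \delta)}.$$

Lemma 7 follows immediately. 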
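
To get a feel for the tail bound in Theorem 12, the sketch below evaluates $e^{-L D(\gamma \| \delta)}$ and compares it with an exact tail in the simplest case covered by the hypothesis, namely independent Bernoulli($\delta$) variables, for which $\mathbb{P}\{\bigwedge_{i \in S} X_i = 1\} = \delta^{|S|}$ holds with equality. The function names are hypothetical, and $D$ is taken to be the binary relative entropy in nats.

```python
import math


def kl_bernoulli(gamma, delta):
    """Binary relative entropy D(gamma || delta) in nats, for 0 < delta < 1."""
    total = 0.0
    if gamma > 0:
        total += gamma * math.log(gamma / delta)
    if gamma < 1:
        total += (1 - gamma) * math.log((1 - gamma) / (1 - delta))
    return total


def tail_bound(L, gamma, delta):
    """The bound exp(-L * D(gamma || delta)) from the theorem."""
    return math.exp(-L * kl_bernoulli(gamma, delta))


def exact_binomial_tail(L, gamma, delta):
    """P{Bin(L, delta) >= gamma * L}, computed exactly."""
    k_min = math.ceil(gamma * L - 1e-12)
    return sum(math.comb(L, k) * delta**k * (1 - delta)**(L - k)
               for k in range(k_min, L + 1))


L, gamma, delta = 50, 0.5, 0.3
bound = tail_bound(L, gamma, delta)
exact = exact_binomial_tail(L, gamma, delta)
```

In the application to Lemma 7, the variables are not independent; the point of the theorem is that the product-type bound from Lemma 10, with $\delta = \rho + 2\lambda$, is already enough to obtain the same exponential tail.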






